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Abstract. Fundamental theories and practical methods for large-scale electronic 
structure calculations are given, in which the computational cost is proportional to 
the system size. Accuracy controlling methods for microscopic freedoms are focused 
on two practical solver methods, Krylov-subspace method and generalized- Wannier- 
state method. A general theory called the 'multi-solver' scheme is also formulated, as a 
hybrid between different solver methods. Practical examples are carried out in several 
insulating and metallic systems with 10 3 -10 5 atoms. All the theories provide general 
guiding principles of constructing an optimal calculation for simulating nanostructure 
processes, since a nanostructured system consists of several competitive regions, such as 
bulk and surface regions, and the simulation is designed to reproduce the competition 
with an optimal computational cost. 
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1. Introduction 

Electronic structure theory plays a crucial role in understanding and controlling 
nanostructures, structures in nano-meter and ten-nano-meter scales. Dynamical 
simulation in these scales is, however, impractical for the present standard methodology, 
such as the Car-Parrinello method 0, owing to its heavy computational cost. From 
1990's, many calculation methods and related techniques have been proposed for large 
systems, systems with thousands of atoms or more, by calculating one-body density 
matrix or the Green's function, instead of one-electron eigenstates. [2131111001111313 
HOI HH C21 Uni HH 113 Uni HZl HHJ Uni In these methodologies, calculation is carried out 
with real-space representation and a physical quantity (X) is given as a trace form 

(X)=Tr\pX] = J I drdr'p{r,r')X(r',r). (1) 

Here the one-body density matrix p is defined, from occupied one-electron eigenstates 
4>k(r), as 

occ. 

P-El4 eis) )(0i eig) l- (2) 
k 

One can find that, if the matrix X(r, r') is of short range, the off-diagonal long- 
range component of the density matrix does not contribute to the physical quantity 
(X), which is important for practical success of large-scale calculations. [2 Actual 
calculation methods and their applications are found in recent reviews [3 E] or 
papers. El 13 El 13 HDl HU H21 113 HH HHJ H3 HZI CHI IIHI A set of theories and program 
codes have been developed in our group and a test calculation of Fig. [T] shows that 
the computational cost is 'order-iV' or proportional to the system size (N) among the 
calculations with 10 3 -10 7 atoms [13 El 00 123 EH! • 

A practical success in an application study always requires the balance between the 
accuracy and the computational cost. Every calculation method has several controlling 
parameters and one should establish a systematic way of setting them in optimal values. 
Here we remember that a nanostructure is composed of several comparable regions with 
essential difference in electronic structure, such as bulk and surface regions. Since 
the competition of these regions gives various structural and functional properties 
of nanostructures, the requirement on dynamical simulation of a nanostructure is to 
reproduce the competition, or to reproduce the difference in electronic structure among 
the regions, throughout the process. 

In this paper, we will show how to construct an optimal calculation scheme for 
nanostructure process. The essential concepts are (i) controlling method of the accuracy 
and the computational cost by monitoring residuals for microscopic or basis freedoms 
and (ii) choice or combination of different calculation methods. Hereafter the word 
'solver method' is used as a practical calculation method of density matrix p with a 
given Hamiltonian H. 

This paper is organized as follows; In Sec. 121 we will explain the foundation of 
two methods, Krylov subspace method and generalized Wannier state method. They 
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are practical solver methods to calculate the density matrix for a given system and 
we will compare them, in Sec. 12.31 from a practical view point. In Sec. H3 we will 
construct a methodology of ' mult i- solver' scheme, as a hybrid or combination of different 
solver methods. Several applications as molecular dynamics (MD) simulations will be 
presented in Sec. IH so as to clarify the methodological points. In the present paper, 
we limit the formulations into those for a Hamiltonian if as a real-symmetric matrix. 
Practical calculations were carried out with Hamiltonians in the Slater-Koster (tight- 
binding) form; The Hamiltonian for fee Cu is constructed from the first-order form H^ l > 
of the linear muffin-tin orbital theory [21] and those for C and Si are typical ones in 
Ref. [22 and Ref. [23], respectively. 




Figure 1. The computational time as a function of the number of atoms (N) 
(Rcfs. ^1 Eli this work); Several metallic (fee Cu and liquid C) and insulating 
(bulk Si) systems are calculated up to 11,315,021 atoms . The time was measured for 
the electronic structure calculation with a given atomic structure. The calculations 
were carried out by the conventional eigenstate calculation (EIG) and by our 
methods for large systems; (i) Krylov-subspace method with subspace-diagonalization 
procedure (KR-SD), (ii) Wannier-state method with variational procedure (WS-VR) 
and (iii) Wannier-state method with perturbative procedure (WS-PT). For '1CPU' 
computations, we used single Pentium 4™ processor in 2 GHz. Parallel computations 
were carried out by SGI Origin 3800™ (for WS-PT method), Origin 2800™ (for 
WS-VR method) and Altix 3700™ (for KR-SD method). See text for details. 



2. Theory (1) Practical solver methods 

2.1. Solver methods with Krylov subspace 

Krylov subspace is a general mathematical concept defined as the linear space of 

K v (H,\j)) = Bps*{\j), H\j), H 2 \j), H»- l \j)}. (3) 

Here the 'starting' vector and the dimension of the subspace (u) are arbitrary. Many 
iterative methods, such as the standard conjugate-gradient method, are formulated with 
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Krylov subspace. See a recent textbook |2I|, for example. In the present context, the 
matrix if is a Hamiltonian and \j) is a real space basis. A large-scale calculation 
can be realized, when the density matrix (i\p\j) is constructed within the Krylov 
subspace fC v (H, The Krylov subspace method enables us also to calculate the 

Green's function (i\G\j), which gives directly the information of electronic states, such 
as the density of states (DOS). When the dimension v is equal to that of the original 
Hamiltonian matrix H, the linear space of Eq. Q is complete and all the calculation 
results are exact. [21] 

2.1.1. Subspace- diagonalization method Here we explain a practical solver method 
with Krylov subspace, called 'subspace-diagonalization method' (KR-SD) [TB] : First, we 
construct an orthogonal basis set {\K^)} for the Krylov subspace ((K^\K^J) = 5 nm ); 

1C V (H, = span{|tf?>) = \j), \K!p), ••, \K^)} (4) 

by the Lanczos procedure, a three-term recurrence formula. The n-th basis \K^) is 
constructed in the n-dimensional Krylov subspace (\K%>) E JC n (H, In result, a 

reduced Hamiltonian matrix 

(H K ^) nm = (K^\H^\K^) (5) 

is obtained as an explicit {y x v) matrix. A typical subspace dimension is v = 30 in MD 
simulations. Then, we diagonalize the reduced (small) matrix 

H K ^\ V M) = eg ) |i# ) >, (6) 

with a negligible computational cost. The resultant eigen vectors \v%>) are described as 
the set of coefficients = (K$\v^). 
The density matrix is obtained by 

(i\p\j) ^ (i\p K{j) \j) (7) 

= tm^)(K^\ P K ^\j), (8) 

n 

with the definition of 

p«<fi = ^\v®)f r (eU> - Mv<p\. (9) 

a 

Here the occupation number f T (e — fi) is given by the Fermi-Dirac function with a 
temperature (level-broadening) parameter r and the chemical potential fi. The chemical 
potential is determined by the bisection method. The Green's function (i\G(z)\j) can 
be calculated in a similar manner. In short, the present procedure is a standard 
quantum-mechanical calculation for eigen states, except the point that the calculation 
is carried out within the Krylov subspace. Therefore it is straightforward to apply the 
method to calculations with a nonorthogonal basis set, in which a generalized eigen- value 
equation, instead of Eq. (JHJ), is solved within the subspace. 
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2.1.2. Shifted conjugate- orthogonal conjugate- gradient method Another solver method 
with Krylov subspace was formulated and called 'shifted conjugate-orthogonal 
conjugate- gradient' (SCOCG) method. Its foundation is given by a mathematical 
theorem proved recently. [23] The practical procedure is based on an iterative solver 
method for the linear equation of the Green's function; 

(z-H)\ Xj ) = \j), (10) 

because of Gij = (i\xj) = (i\(z — The density matrix is obtained by 

1 f°° 

pij = — / lmGij(e + i0) f T (e - fi) de. (11) 



7T 

The SCOCG method and KR-SD method share many common features but are different 
in the numerical treatment. See the original paper ^Hj for detailed comparison. For the 
present time, we use, mainly, the KR-SD method for MD simulation and we think that 
the SCOCG method is suitable to discuss a very fine energy spectrum of the Green's 
function. [TH] 

2.1.3. Accuracy control with residual So as to monitor the accuracy during the 
simulation, we calculate a residual vector of the Green's function; ^§1 

\ 6Gj ) = (z-H)G\j)-\j). (12) 

The residual vector is defined individually among the basis suffix j. We observed that the 
required subspace dimension v = i/w) for a given criteria on the residual vector is different 
among surface and bulk regions. JH] Such a determination of the controlling parameters 
{v^'} is an example of the accuracy control for microscopic or basis freedoms. 

2.2. Solver methods with generalized Wannier state 

Another method for obtaining the density matrix in large systems is formulated using 
generalized Wannier state. |2H1 I2ZI El IZl I2HJ HU EOI A physical picture of the 
generalized Wannier states is localized chemical wave function in condensed matters, 
such as a bonding orbital or a lone-pair orbital with a slight spatial extension or 
'tail'. 13 ESI UH EH do] Their wavefunctions {</>j WS ^} are equivalent to the unitary 
transformation of occupied eigen states and satisfy the equation of 

occ 

W, <WS, )=I>d4 WS, >. (13) 
i=i 

where the matrix is the Lagrange multipliers for the orthogonality constraint 
((</>i W \<$f ) = Sij). The suffix i of a wavefunction i - WS ' ) indicates the position of 
its localization center, such as bond site. Since Wannier states give the density matrix 
p in Eq. (J2J) by replacing eigen states {(f)^ 1 ^} into Wannier states {^f^}, any physical 
quantity can be reproduced in the trace form of Eq. (JTJ). 

Our practical solver methods are based on a mapped eigen- value equation [TTJ [3T] 
that is equivalent to Eq. (JTHJ); 

rAi) u( ws )\ _ r B u( ws )\ (1A\ 
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where 

H% = H + 2rhPi - Hpi - p t H (15) 

occ. 

p-^p-i0r s) )(0S ws) i=Ei^ ws) )(^ ws) i- (is) 

The energy parameter r] s should be much larger than the highest occupied level. 
Equation (fPfj) was derived in Refs. EH] and will be derived again, from a different 
theoretical background, in Sec. 13. II of this paper. 

2.2.1. Variational procedure in Wannier- state method Equation (fT4*j) gives a practical 
iterative procedure to generate Wannier states under explicit localized constraint, 
[TTJ [T3J [T7J EI] which is called variational Wannier state method. See papers [TTJ EI] for 
details. Residual vector for each wavefunction |</>j WS ^) 

i^ ws ^^ s i0r s) )-^ s i^ ws) )- (17) 

is monitored for each Wannier state during the simulation, so as to control the accuracy, 
which realizes the accuracy control for microscopic or basis freedoms, as discussed in 
the Krylov-subspace method with Eq. (|12j). A practical success in the Wannier-state 
method is realized, when all or a dominant number of wavefunctions are well localized. 
Examples and technical details are given in Sec. 14.21 and references |11| EE 121] • 

2.2.2. Perturbative procedure in Wannier-state method We developed also a 
perturbative method, [HI EHl E21 EI] in which a perturbation solution of Eq. (JUJ) 
is constructed for each Wannier state |0 4 - WS ' ) ); 

i0S ws) )^^(i0! ws)(o) )+i^ (ws)(i) ))- as) 

Here ) and \4>i ) are the unperturbed and first-order perturbation terms, 

respectively, and the factor Cj is the normalization factor. The unperturbed term 
|0(ws)(o)^ g^Q^^ k e p re p arec [ as an input quantity and the perturbation term |^ WS ) 
and the normalization factor Ci are determined automatically by the standard first- 
order perturbation procedure. [HI 123 E21 EI] During a simulation, the weight of the 
unperturbed term 

li^W^l 2 (19) 

is monitored, for each wavefunction, as an accuracy control for microscopic or basis 
freedoms. In silicon crystal, for example, the ideally sp 3 -bonding wavefunction is 
chosen as the unperturbed term and the weight of the unperturbed term is dominant 
(wq^ = 0.94) [TU |22l EI], which validates the perturbative treatment. When the 
perturbative method is validated, its computational performance is faster than that 
of the variational method, since the perturbative method gives a simpler procedure to 
generate the wavefunctions and does not require any iteration loop. 
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2.3. Comparison between Krylov- sub space and Wannier-state methods 

When the Wannier-state methods are compared with the Krylov-subspace methods, 
the Wannier-state methods require an initial guess of wavefunctions in the variational 
(iterative) method or an unperturbed term of wavefunction in the perturbative method. 
As an example, the reconstruction on Si(OOl) surface was calculated with the force on 
atoms. The calculation was carried out by the two Krylov-subspace methods, (i) the 
subspace diagonalization procedure |IB| and (ii) the SCOCG procedure [TH] . and (iii) 
the variational Wannier-state method. [31] In the variational Wannier-state method, the 
initial guess of the wavefunctions are prepared to be the lone-pair state of (\s) + \p z ))/ V% 
for surface states and to be the sp 3 -bonding states for other (bulk) states. The three 
methods reproduce the energy differences satisfactorily among the (2 x 1), (2 x 2), and 
(4x2) surfaces, when these results are compared with those of the eigen-state calculation 
with the present Hamiltonian [33] arid the standard ab initio calculation [34J. 

The perturbative Wannier-state method is much limited in its applicability than 
the above three methods, because the unperturbed term should be prepared as a good 
approximation (|0i WS ^) ~ |0- WS ^°' ) ) or Wq ~ 1). So far we have applied the perturbative 
Wannier-state method only to the bulk (sp 3 -bonding) wavefunction in the diamond- 
structure solids without deformation or with small (elastic) deformation. [TTJ HH] 
Since the first-order perturbation form was used for the wavefunction |0i WS ^) in these 
cases, the calculated energy e- WS ' ) = (0^ WS ^ |if|0- WS ^) is correct within the second order 
with respect to deformation and the elastic constants are well reproduced. We should 
say, however, that a drastic change of wavefunction, like that in a bond-breaking process, 
is not reproduced by the perturbative Wannier-state method, if the bulk (sp 3 -bonding) 
wavefunction is chosen as the unperturbed term. 

Despite the limitations, the computational performance of the Wannier-state 
methods is faster, at best by several hundred times, than that of the Krylov-subspace 
method, if it is applicable. In Fig. Q for example, the Wannier-state method with the 
perturbative procedure (WS-PT) using single CPU is faster than the Krylov-subspace 
method with the subspace-diagonalization procedure (KR-SD) using 64 CPUs. 

When one thinks about a guiding principle for how to choose a solver method in 
an application study, the above discussion suggests that the Wannier-state methods 
give a faster performance, when the input wavefunctions are near the final solutions 
and, particularly, they are well localized. In other cases, the Krylov-subspace method 
is preferable, since the Krylov-subspace method does not require any input quantity for 
electronic states. 

3. Theory (2) Multi-solver scheme 

3.1. Formulation 

As another fundamental methodology for large-scale calculations, we developed a 'multi- 
solver' scheme |15| I3T] , as hybrid or combination of two different solver methods. Its 
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basis idea is that the density matrix is decomposed into two parts called 'subsystems' 
and they are given by different solver methods. As discussed below, the multi-solver 
scheme will be fruitful, particularly, when the simulation system is composed of different 
regions, such as bulk and surface regions, and different solver methods are used in these 
different regions. 

The mathematical foundation of the multi-solver scheme is based on the 
commutation relation of the density matrix; 

[H, p] = 0. (20) 

When the occupied one-electron states, eigenstates or Wannier states, are classified into 
two groups A and B, the density matrix is decomposed into the corresponding two parts 

P = Pa + Pb (21) 



where 



occ.(A) 

p A = \<f>i)(U (22) 



occ.(B) 

PB = E l^'X^'l = P-PA- (23) 

3 

Here we call pa and ps 'subsystems' and the two subsystems are orthogonal 

PaPb = 0, (24) 
owing to the orthogonality relation 

(&|&) = 0> ^gA^-gB. (25) 
If the subsystems Pa, Pb are defined from eigenstates, a mapped Hamiltonian 

H^ = H + 2r] s p B , (26) 
with a scalar i] s , satisfies the commutation relation 

[HM, PA ] = (27) 
owing to Eq. (j2IJ) and 

[H,p a ]=Q (a = A, B). (28) 

We call the scalar t] s 'energy-shift parameter'. If ps is given, the problem for obtaining p& 
is reduced to a standard quantum mechanical problem with the well-defined Hamiltonian 
H^J p . In practical calculations, the energy shift parameter is chosen to be so large that 
the states in p B do not lie in the occupied energy region of H^ p . Note that Eqs. (j27| . 
(f2~%|) are satisfied, even if the subsystems Pa, Pb are constructed by eigen states with 
fractional occupancy. 

If the subsystems Pa, Pb are defined from Wannier states, on the other hand, 
Eq. (|2*7|) are not satisfied, because Eq. (J2~%|) is not satisfied. Then we redefine the 
mapped Hamiltonian as 

#i A a ) p = H + 2 VsPB - Hp B - p B H, (29) 
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which satisfies Eq. (|27j) in the cases of eigenstates and Wannier states. See | Appendix A 
for proof. A simplest case is that the subsystem pA is consist of only one Wannier 
state 0j- WS ' ) . In this case, the mapped Hamiltonian in Eq. (J2H|) is reduced to that in 
of Eq. dHJ), because of pa I^F^K^F^ I an d Pb =^ Pi- In other words, the present 
theory gives another derivation of Eq. (jl4j) . the mapped equation of Wannier state. 
From a practical view point, the term {Hps + PbH) in Eq. (J29"j) can be ignored, when 
the energy shift parameter r) B is so large (r) B — > +00) that the energy band of pb is well 
separated, energetically, from that of pa- If the term is ignored, the mapped Hamiltonian 
is reduced to the form of Eq. . 




Figure 2. Example of the multi-solver scheme in a silicon slab with ideal (001) 
surface; (a)(b)The electron population per atom is plotted as the function of atomic 
layer. The atoms at z — correspond to the surface atoms. The calculation is carried 
out by the conventional eigen-state calculation (n eX act) and the present multi-solver 
scheme (jia and ub)- (c)(d) DOS in the multi-solver scheme. Lower panel : DOS of 
the original Hamiltonian H. Upper panel : DOS of the mapped Hamiltonian H^&p- 



3.2. Example 1 

Hereafter, the multi-solver scheme will be demonstrated. Although the formulation of 
the multi-solver scheme is general, we have used, so far, the scheme only with the 
perturbative Wannier-state method for a subsystem (ps)- Among these cases, the 
subsystem p# is determined in the first-order perturbation form, and then the other 
subsystem pa is determined, through the mapped Hamiltonian , by a different 
solver method (p# =>- =^ pa)- In other words, the present procedure does not 

contain a self-consistent loop (p# =>■ =>■ p A => H^J p =^ p B ■ -). A related general 

discussion will be given in Sec. 14.31 

The first example is a Si slab with ideal (001) surface, in which we use the eigen- 
state method for pa and the perturbative Wannier-state method for p#. Each atomic 
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layer contains 64 atoms and the total number of atoms is 64 x 16 = 1024 in the periodic 
simulation cell. Since an ideal (001) surface gives an almost zero energy gap (0.025 eV), 
the present example is one of the severest tests for the present methodology. The z 
coordinate is written in the unit of atomic layer (z = 0, 1,2. ...15). The surface atoms 
are located at z = and have dangling-bond electrons. The atoms in the opposite 
surface (z = 15) are terminated by the bulk (sp 3 -bonded) Wannier states and do not 
have any dangling-bond electrons. The z coordinate of bulk-bond sites can be described 
as half integers (z = 0.5, 1.5, 2.5, ....14.5). In the multi-solver scheme, the subsystem ps 
is constructed from the Wannier states whose localization (bond) centers are located 
deeper than the eighth atomic layer (z = 8.5,9.5,..). The rest system is assigned to 
the subsystem pa that contains the surface states. The wavefunctions 0j- WS ' 1 in p B are 
determined by the perturbation form and, then, pA is determined by diagonalizing the 
mapped Hamiltonian . The energy shift parameter is chosen as 2r] s = la.u. (~ 
27.2 eV). 

In Fig. Efa), the electron populations of the subsystems, ua(z) and n B (z), are 
plotted as the function of the atomic coordinate z. The total electron population in 
the multi-solver scheme (ua + «b) reproduces the exact one n ex act(^)- As a remarkable 
result, the population at z = 8 is contributed by both of the subsystem pa and ps 
with an almost equal weight, since the Wannier states located at z = 7.5 and those at 
z = 8.5 belong to pa and p B , respectively. Figure Efb) shows that ua(z) decays quickly 
at z > 8, because of the nature of the mapped Hamiltonian . 

So as to understand the multi-solver scheme, Figs. Efc)(d) show the DOS of the 
original Hamiltonian and the mapped Hamiltonian H^ p . The energy origin (e = 0) 
is chosen at the lowest unoccupied level in H. Each eigen level is drawn as a spike 
with the width of Ae = 0.02 eV. In the DOS of the mapped Hamiltonian, the band 
in the occupied energy region (e < 0) is that of p&, while the band of ps is shifted by 
2i] s = 27.2eV, owing to the term of 2r] s pB in , and is located at the high-energy 
region at 13eV < e < 25eV. As in Fig. Efd), the two DOS profiles agree excellently 
at the bottom of the unoccupied energy region (0 < e < 0.7eV). Here we recall that 
the original and mapped Hamiltonians, from their definitions, share the unoccupied 
eigen states that gives the density matrix of p = 1 — p ([p, H] = [p, H^ p ] = 0) and 
the disagreement in the present result appears only because ps is deviated from the 
exact one. The excellent agreement at the band bottom (0 < e < 0.7eV) appears, since 
these states are contributed dominantly by surface states and are almost free from the 
subsystem p B . 

3.3. Example 2 

The multi-solver scheme was demonstrated in large-scale calculations. The first example 
is reconstructed (001) surface of Si slab with 10 4 atoms, which is determined with the 
force on atoms. The practical procedure is the same as in Sec lH.2^ except the point that 
the subsystem pa is calculated by the Krylov-subspace (KR-SD) method instead of the 
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Figure 3. Multi-solver scheme with automatic assignment of subsystems (pa, Pb)- 
Top (a) and three-dimensional (b) views of a silicon nano-crystal with 4501 atoms. 
Atoms are visible, only if its electron population is dominated by the subsystem pa- 
The figures are drawn in ideal crystalline geometry for eye guide, though the actual 
system is deformed. The sample edges are plotted as lines for eye guide. 

exact diagonalization method. The result shows the correct surface reconstruction. [TE] 
The second example is a MD simulation of a silicon nanocrystal with 4501 atoms. 
The the multi-solver scheme is constructed from the variational Wannier-state method 
for pa and the perturbative Wannier-state method ps- [HI] An external load is imposed 
in the [001] direction and one initial defect bond is introduced by imposing a repulsive 
force on an atom pair. The sample is deformed with external load, the initial defect 
bond and thermal motion but not fractured. The subsystems, pA and pb, are assigned 
automatically during the MD simulation, as explained below; First, all the wavefunctions 
are calculated by the perturbative solver method, in which the weight of the unperturbed 
term Wq^ is defined for each wavefunction <pj. (See Sec. l2.2j) If the weight Wq^ of a specific 
wavefunction is less than 95 % of the averaged weig ht 4 ave) (w { j) < 0.95 ^ avc) ), the 
corresponding wavefunction <pj is assigned into the subsystem pa and is determined by 
the variational procedure. In other words, if the perturbative procedure does not give 
a satisfactory accuracy, the procedure is switched automatically into the variational 
one. The result of the automatic assignment is shown in Fig. EJ in which atoms are 
visible, only if its electron population is significantly contributed from pa- As a result, 
the subsystem pa, treated by the variational procedure, appear mainly near the sample 
edges and in the internal region near the initial defect bond, because these regions are 
significantly deformed and the electronic states in these regions are fairly deviated from 
that in ideal crystal. 

As a technical detail of the MD simulation with the multi-solver scheme, we used 
a fine tuning technique of lattice constant; [SI] In calculations of ideal silicon crystal, 
the equilibrium lattice constant or bond length differs by 2 % between the variational 
and perturbative methods. The difference can cause, in principle, an artificial lattice 
mismatch in the multi-solver scheme and therefore we tuned the bond length, by 
imposing an additional two-body classical potential on an atom pair or bond site, if the 
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atom pair is occupied by a perturbative Wannier state. This fine tuning technique avoids 
the possible artificial lattice mismatch. Although the calculation results without the fine 
tuning (not shown) did not indicate any practical problem among our calculations of 
silicon, we presume that an error of 2 % in lattice constant might be non-negligible in 
several cases. For example, the lattice constant between Si and Ge is different by 4 % 
and the artificial lattice mismatch by 2 % might cause a problem, when a Si/Ge system 
is calculated. 



4. Applications 




Figure 4. MD simulation of liquid carbon; (a) Pair correlation (PC) function 
calculated by the standard eigen-state method with 216 atoms and by the Krylov- 
subspace method with 13824 atoms. In the former calculation, the function is plotted 
only within r < 6.45A, since the simulation cell size is smaller, (b) DOS in a snapshot 
with 13824 atoms, using the Krylov subspace method. The energy origin (e = 0) 
is chosen at the chemical potential, (c) Mean square displacement using the Krylov 
subspace method (KR) or the standard eigen-state method. In the latter method, the 
level-broadening parameter in the Fermi-Dirac function is set to r = O.lcV (EIG1) and 
t = 0.136eV (EIG2). The numbers of atoms are 216 in the main figure and 13824 in 
the inset, respectively. 



4-1. Liquid carbon : a metallic system 

Liquid carbon was simulated with the Krylov-subspace method as a test calculation. 
The cubic simulation cell is used with 216 and 13824 atoms. The density and the 
temperature are set to be p = 2.0 g/cm 2 and T = 6000K, respectively. The time 
interval between MD steps is set to be At = lfs. As technical details, the subspace 
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Figure 5. Cleavage process of a silicon nanocrystal under [001] external load, in 
which a rod indicates a bonding state and a ball indicates an atomic (non-bonding) 
state. (a)-(c):The 3D views of successive snapshots with the time interval of about 0.7 
ps. (d)(e):Top views of the lower cleavage plane, a (001) surface, in the snapshot (b) 
and (c), respectively. The green arrow indicates the cleavage propagation direction of 
the lower cleavage plane, (f) Color samples of the atomic states (balls), which indicate 
the weight of s orbitals (/ S W ); (i)0 < < 0.2 for blue, (ii)0.2 < f^> < 0.3 for cyan, 
(iii)0.3 < < 0.4 for white, (iv)0.4 < fP < 0.5 for green, (v)0.5 < < 0.6 for 
yellow and (vi)0.6 < / s for red. (g) Example of the asymmetric dimer geometry, (h) 
A cleaved sample with 118850 atoms that is calculated by the multi-solver scheme. 
The picture is drawn for the semi-infinite region of y > x. Electronic states, rods or 
balls, are depicted only for the subsystem pa- Note that, in larger samples such as (h), 
the (001) cleavage mode will be fairly unstable, owing to step formations. [T5] 



dimension and the number of atoms in the real-space projection (See |Appendix B ) are 
chosen to be v = 30 and iVpR = 200, respectively. 
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Figure EJa) shows the resultant pair correlation (PC) function for the conventional 
eigen-state method with 216 atoms and for the Krylov-subspace method with 13824 
atoms. The two graphs are indistinguishable, owing to an excellent agreement. In 
Fig. EJb), the DOS is calculated, from the Green's function, by the Krylov subspace 
method with 13824 atoms. In the DOS calculation, the controlling parameters are set 
into a heavier computational cost {y = 300 and N RP = 1000), so as to reproduce the fine 
DOS profile. Since the present Hamiltonian includes only s and p orbitals, the resultant 
DOS is missing in higher energy regions. The imaginary part of the energy (z = E + i r y) 
is chosen at 7 = 0.05eV. The resultant DOS profile in Fig.|3fb) shows the correct feature 
of liquid carbon, as follows; A narrow tt band appears, from E = — 5eV to +5eV, as in 
nanotubes, which can be decomposed the bonding and antibonding bands. The tt bond 
in liquid phase is, however, imperfect and non-bonding (atomic) p states appear as a 
sharp peak near the chemical potential (e ~ 0.6eV). 

Figure Wi c ) shows the resultant mean square displacement (MSD) for the Krylov 
subspace method (KR) and the conventional eigen-state method (EIG1,EIG2). The 
main figure shows a system of 216 atoms by the two methods, while the inset shows that 
of 13824 atoms by the Krylov subspace method. In the eigen-state method, the level- 
broadening (temperature) parameter in the Fermi-Dirac function is set to r = O.leV 
(EIG1) and r = 0.005au = 0.136eV (EIG2), respectively, so as to show that the detailed 
treatment near the Fermi level causes different fluctuation behaviors of the MSD. Since 
the difference in fluctuation behavior is seen even among the two cases of the eigen-state 
method, we conclude that the Krylov subspace method shows satisfactory agreements 
with the eigen-state method for PC function and diffusion constant (the gradient of the 
linear behavior in the main figure of Fig. Efc)). 

4-2. Silicon : cleavage process 

As a practical large-scale calculation, silicon cleavage process was investigated. [13 
EH HI] The Wannier-state method is used, since it is faster than the Krylov-subspace 
method, when, as discussed in Sec. I2.3[ a dominant number of wavefunctions are well 
localized. The number of atoms in the localization region for each Wannier state (N^) 
is assigned to be Nj^ = 20 — 80, which is determined by the residual norm |^ WS) | 2 . The 
resultant density matrix has a spatial spread, in its off-site elements, over regions with 
hundreds of atoms. Particularly, wavefunctions near cleaved regions tend to have a large 
residual norm and the localization constraint on such wavefunctions are automatically 
relaxed to increase the number . We found that such a way of controlling the 
accuracy for microscopic freedoms is crucial for reproducing the surface reconstruction 
on cleaved surface. See Ref. [3T] for details. 

Figure Efa)-(c) shows a silicon cleavage process with the variational Wannier-state 
method. The external load is imposed on the [001] direction, as in our previous 
simulation [15J. The present system, unlike the previous one |15j . does not contain 
any initial defect for 'cleavage seed'. In result, the cleavage starts from two points 
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on the sample edges and two cleavage planes appear. The lower cleavage surface is 
shown in Figs. Efd) and (e). In Fig. a rod (atomic wavefunction) or ball (bonding 
wavefunction) is assigned for each wavefunction, according to the weight distribution 
among atoms. The black rods are the reconstructed bonds that are not seen in the 
initial (crystalline) structure. A ball is assigned for an atomic (non-bonding) orbital, 
localized on an atom site. On the cleaved surface, an asymmetric dimer appears, as on 
a clean (001) surface, with a ball (lone-pair state) on the upper atom, which is shown 
in Fig. EDj?). For quantitative discussion of orbital freedoms, a parameter /W is defined, 
[T5] for a wave function 0j, as 

/„ W = £IW«>I 2 > (30) 

I 

where \Is) is the s orbital at the 7-th atom. For example, /W = 1/4 in an ideal 
sp 3 hybridized state. To visualize the orbital freedom of wave functions, the atomic 
(non-bonding) states are classified by the color of ball, according to the value of /W 
(See the caption of Fig. (HJ). After a bulk (sp 3 ) bond is broken, the corresponding 
wavefunction is stabilized with increasing the weight of s orbitals (/^ > 0.5), which 
results in appearance of red or yellow balls on cleaved surface. 

As a remarkable result, a well-defined dimer- row domain is formed by nine dimers 
in Fig.^Je), in which the tilting freedoms of asymmetric dimers are fixed into the (2 x 1) 
configuration, although the surface energy of the (2x1) surface is higher than that of the 
(4 x 2) surface (See Sec. 12. Hj) . We suggest that the directional anisotropy of deformation 
is caused by the cleavage propagation direction, as indicated by the green arrow in 
Fig. Efe), and gives the ordering of the tilting freedoms into the (2 x 1) configuration. 
We also calculated many other systems (not shown) in different sample geometry, which 
supports the above suggestion. 

Figure Efh) is a larger system simulated by the multi-solver scheme, in which we 
use the variational and perturbative Wannier-state methods for subsystems pa and p#, 
respectively. [TH} EH] The system contains 118850 atoms and the sample dimension is 
Uno x riiio x n QO i = 97 x 100 x 49 in the unit of the atomic layers, where n 110 = 100 
corresponds to about 20 nm. Here the subsystem pa was composed of selected Wannier 
states near fracture regions and the rest part of the electron system is defined as the 
subsystem ps- The number of Wannier states in the subsystem pa is approximately 5 
% of the total and the computational cost by the present multi-solver scheme is nearly 
1/10 of that by the single-solver calculation with the variational procedure. In Fig.Efh), 
the electronic states in the subsystem pa are depicted as rods or balls and those in the 
subsystem p B are invisible. The cleave surface in Fig. Efh) contains (001) surface but 
is fairly unstable with many step formations. [T5| EH] See Refs. |15[ IT7j for the physical 
discussion of the instability. 
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4-3. General discussion on the multi-solver scheme 

Finally, a general discussion is made for a practical application of the multi-solver 
scheme. Among the present examples, the procedure was carried out without a self- 
consistent loop (pb =>■ H^X =>- Pa), as explained in the beginning of Sec. 13.21 The 
present non-selfconsistent procedure is practical, particularly, if the electron system 
can be decomposed into two parts that are governed by stronger and weaker binding 
mechanisms, respectively. In the present examples, the electronic states in the bulk 
part (pb) are governed by a stronger binding mechanism (the sp 3 bonding) than those 
in surface states (pa) and can be well described without any detailed information of 
the surface states (pa)- Another example of the decomposition may be a system with 
strong a bonds and weak tt bonds. The situation of the decomposition is a candidate of 
the multi-solver scheme. When the multi-solver scheme is used, the solver method for 
each subsystem should be chosen from the discussion of Sec. 12.31 

We note that the multi-solver scheme with a self-consistent loop can be realized, in 
principle, and its practical application might be a possible future work. 

5. Concluding remarks 

This paper presents fundamental theories and practical methods for large-scale electronic 
structure calculations, particularly for dynamical process with nm-scale or 10-nm-scale 
structures. First, we presented several practical solver procedures, based on Krylov 
subspace and generalized Wannier state, so as to obtain the density matrix without 
calculating eigen states. We emphasized that every method has a way of accuracy control 
for microscopic freedoms, by monitoring the residuals of exact equations. Second, the 
'multi-solver' scheme was formulated based on the commutation relation of the density 
matrix and was used for a hybrid or combined method of different solver methods. 
Several practical large-scale calculations were carried out in metallic and insulating 
cases. 

These methodologies enable us to design a simulation of nanostructure process 
with an optimal computational cost, in which the accuracy is controlled dynamically 
for microscopic (basis) freedoms and solver methods may be different among different 
regions. These points are essential in nanostructured systems, nm-scale or 10-nm-scale 
systems, because a competition between different regions, such as bulk and surface 
regions, is essential and is required to be reproduced in simulation. Since the above 
requirement is general among nanostructure processes, the present discussion is always 
valid, even when a different system is calculated by a different solver method from those 
in the present paper. 

Acknowledgments 

This work is supported by a Grant-in-Aid from the Ministry of Education, Science, 
Sports and Culture of Japan. Numerical calculation was partly carried out using the 



Large-scale electronic structure theory for simulating nanostructure process 



17 




•;iihs|i.iLL- d V basis number bnsb; number 't'l ba?is number 



Figure 6. Krylov-subspace method for a foe Cu system with 10,800 atoms; (a) The 
convergence behavior of the band structure energy as the function of the subspace 
dimension v [y —2, 5, 10, 20, 30, 60 and 90). A reference value calculated by the 
standard eigen-state method is plotted as a dashed line, (b) The decay behavior of the 
density matrix (Kn^\p K ^\j), as the function of the basis number (n). The subspace 
dimension is set to be v = 90 and the temperature (level-broadening) parameter is set 
to be t = 0.1 eV. The circle and square indicate the values with the staring bases of the 
s and d (e g ) orbitals — \s), \e g }), respectively. (c)(d)Decay behavior of the density 
matrix \p K W) \j) with respect to the basis number of the Krylov subspace (n). The 
circles indicate the result of 10,800-atom system with the real-space projection and the 
crosses indicate the result of 876-atom system without the real-space projection. The 
staring basis (|j)) is chosen to be a d (e g ) orbital. The temperature (level-broadening) 
parameter is set to be r = 0.1 eV in (c) and r = 0.5 eV in (d). 
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Appendix A. Proof of the fundamental equation in the multi-solver scheme 

Here we prove Eq. (|27)l. the fundamental equation in the multi-solver scheme, when 
the subsystems pa, Pb are constructed from Wannier states in Eqs. (}2*2*j) and (J23J) and 
the mapped Hamiltonian H}^L is defined by Eq. (J2SJ)- We notice that the projection 
operator onto the unoccupied Hilbert space, p, is defined as 

p = l- p= l-p A -pB (A.l) 

and satisfies 

Hp = pH. (A.2) 
Equation (}2Tj) is satisfied as follows; 



hL a 2 



map ■ 



PA 



[H, p A ] + 2r/ s [p B , Pa] - [Hp B + PbH, p A ] 
(Hp A - p A H) + - (p B Hp A - PaHpb) 
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= (1 - p B )Hp A - p A H (1 - p B ) 
= (p + pa)Hpa - PaH (p + p A ) 
= pHpA - PaHp 

= HppA - PapH = 0, (A.3) 

where the second equality is obtained by Eqs. (J23|) an d the fourth and sixth equality is 
obtained by Eq. ()A.1|) and Eq. (jA.2j) . respectively. The last equality is obtained by the 
orthogonality relation of ppA = PaP = 0. 

Appendix B. Technical details and numerical aspects of the 
Krylov-subspace method 

Here we discuss several technical details of the Krylov-subspace method and demonstrate 
how the method works, particularly in metals. As a demonstration, a fee Cu system 
was calculated with a periodic simulation cell of 10, 800 atoms. The temperature (level- 
broadening) parameter in the Fermi-Dirac function is set to be r = O.leV. As a practical 
technique, a real-space projection technique is introduced; The Krylov subspace is 
generated by a Hamiltonian projected in real space, H^> = P^'HP^\ instead of the 
original one H, where the projection operator P^) projects a function onto the spherical 
region whose center is located at the atomic position of the j-th atomic basis. The 
resultant Krylov subspace is the same as the original one (JC n (H, = K, n (H^\ 
while the bases lie within the projection region ( (H^) n \j) = H n \j)). Since the 
procedure of constructing the Krylov subspace K. V (H^\ is independent among the 
starting bases (j), all the procedures and the quantities are well-defined with the real- 
space projection technique. The projection radius is determined for each starting basis 

so that a given number of atoms, Arp, should be contained inside the radius. The 
present calculation with 10, 800 atoms was carried out using the projection technique 
with Arp = 381. The calculation without the projection technique was also carried out 
in a smaller (876-atom) system, which is discussed below. 

In Fig. Ufa), the convergence behavior of the calculated band structure energy 
is shown as the function of the subspace dimension, in which a reference value is 
also calculated by standard eigen-state calculation with the standard Brouillion-zone 
integration. The deviation from the reference value is about 0.01 eV per atom for 
v =10, 20 and 30 and less than 1 meV per atom for v = 60 and 90. Since the density 
matrix pjj is calculated in the form of Eq. (JHJ), its representation within the Krylov 
subspace (K^\p K ^\j}(= (K^\p K ^\K[^)) is plotted in Fig. Efb), where the starting 
bases \j) are set to be s and d (e 9 ) orbitals, as examples. In Fig. Efb), we observe a 
1/n or faster decay, and this observation is also seen with the other staring bases (p and 
t 2g orbitals). The decay behavior of Fig. IHfb) is explained by a general mathematical 
analysis of the Lanczos procedure [Ej, in which a 1/n decay should appear with the 
zero-temperature formulation (r = 0) and a faster decay should appear with a finite 
temperature formulation (r ^ 0). The quantity (i\K^'), on the other hand, also decays 
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as 1/n or faster (not shown), since the (normalized) vector \K^) has a spatial spread 
within ra-th hopping range from the starting basis {\K)p) e JC n (H, \j))). Consequently, 
their product ({i\K^'){K^)\p K ^\j)) decays as 1/n 2 or faster, which validates the fast 
convergence in the summation of Eq. (JKJ). 

We should emphasis that the decay behavior in Fig. E^b) comes from a general 
property of the Lanczos procedure, as discussed above, not from the projection 
technique. The above statement is confirmed numerically in Figs. EIc)(d), in which fee 
Cu systems were calculated with or without the real-space projection and the resultant 
decay behavior is affected significantly by the temperature (level-broadening) parameter 
r, but not by the projection technique. 
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